Minimum Bayes Risk Acoustic Model Estimation and Adaptation
Abstract
Modern automatic speech recognition (ASR) systems use statistical models of spoken language. These models are typically learned from corpora comprising many hours of transcribed speech. While a variety of machine learning approaches have been applied to this learning task, the optimal learning strategy is unknown. This thesis focusses upon a relatively recent and successful approach, the application of the principle of minimum Bayes risk (MBR) to the estimation and adaptation of acoustic models used in ASR. The aim of the research is to address issues pertaining to the theory, implementation, understanding and performance of MBR acoustic model estimation and adaptation. The first issue concerns the optimisation of the MBR criterion function in the context of continuous density hidden Markov models (HMMs). Iterative update formulae known as the extended Baum-Welch (EBW) equations are generally used to estimate the parameters of the state output distributions of such HMMs such that the MBR criterion is minimised. Previous justifications of the EBW equations have failed both to guarantee a decrease in the MBR criterion with each iteration and to specify a value of the learning rate constant used in these equations. In this thesis, an auxiliary function for the MBR criterion is presented. Via this auxiliary function, the EBW update equations are derived, and a minimum value for the learning rate constant of these equations is calculated. The second issue addressed by the thesis concerns the approximation of errors within the implementation of MBR acoustic parameter estimation. Limitations of previously introduced error approximation methods are explained. An alternative error approximation technique which addresses these limitations is presented.
Incorporation of this novel error approximation technique yields acoustic models which display significant classification performance improvements over models estimated via previously introduced error approximation methods. The third issue pertains to the formulation of the MBR criterion, which may be defined using words, phonemes or other sub-word units. The phoneme-level MBR formulation is known as the minimum phone error (MPE) criterion. Previous research has observed small improvements in classification performance when using MPE-estimated acoustic models in place of word-level MBR-estimated acoustic models. This effect is poorly understood. Theoretical arguments and experimental evidence are presented which lend insight into this phenomenon. Additionally, alternative sub-word MBR formulations are proposed, motivated and experimentally evaluated. The fourth and last issue addressed by this thesis is the performance of acoustic models adapted using unsupervised MBR-based linear regression (MBRLR) adaptation. The theory and implementation of MBRLR acoustic model adaptation are extended by incorporating confidence information into the MBR criterion. This refinement is shown to yield significant classification performance improvements when compared experimentally with standard unsupervised MBRLR adaptation.
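For orientation, the MBR criterion and the EBW mean update referred to in the abstract are commonly written in the discriminative-training literature roughly as below. This is a sketch in assumed notation, not the thesis's own formulation: the symbols (model parameters \lambda, training utterances O_r with reference transcriptions W_r, loss L, occupancy statistics \gamma, first-order statistics \theta, and constant D_{jm} for mixture component m of state j) are introduced here for illustration, and the thesis may use different symbols or a more general form.

```latex
% MBR criterion: expected loss of competing hypotheses W under the model
% posterior, summed over training utterances (assumed notation).
\mathcal{R}(\lambda) = \sum_{r=1}^{R} \sum_{W} P_{\lambda}(W \mid O_r)\, L(W, W_r)

% EBW update of a Gaussian mean, in its commonly cited form.
% "num" and "den" denote numerator and denominator statistics;
% D_{jm} is the learning rate constant mentioned in the abstract.
\hat{\mu}_{jm} =
  \frac{\theta^{\mathrm{num}}_{jm}(O) - \theta^{\mathrm{den}}_{jm}(O) + D_{jm}\,\mu_{jm}}
       {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}}
```

In this form, a larger D_{jm} keeps the updated mean closer to its current value \mu_{jm}, which is why a principled minimum value for the constant is of practical interest.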
Similar resources
Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function
In risk analysis based on Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, the E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...
Temporal masking for unsupervised minimum Bayes risk speaker adaptation
The minimum Bayes risk (MBR) criterion has previously been applied to the task of speaker adaptation in large vocabulary continuous speech recognition. The success of unsupervised MBR speaker adaptation, however, has been limited by the accuracy of the estimated transcription of the acoustic data. This paper addresses this issue not by improving the accuracy of the estimated transcription but v...
Estimation of Scale Parameter Under a Bounded Loss Function
The quadratic loss function has been used by decision-theoretic statisticians and economists for many years. In this paper the estimation of scale parameter under a bounded loss function, which is adequate for assessing quality and quality improvement, is considered with restriction to the principles of invariance and risk unbiasedness. An implicit form of minimum risk scale equivariant ...
GiniSupport vector machines for segmental minimum Bayes risk decoding of continuous speech
We describe the use of Support Vector Machines (SVMs) for continuous speech recognition by incorporating them in Segmental Minimum Bayes Risk decoding. Lattice cutting is used to convert the Automatic Speech Recognition search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pos...
Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task
In this paper, we propose a novel adaptation technique based on coarse/fine training of transfer vectors. We focus on transfer vector estimation of a Gaussian mean from an initial model to an adapted model. The transfer vector is decomposed into a direction vector and a scaling factor. By using tied-Gaussian class (coarse class) estimation for the direction vector, and by using individual Gauss...